19 research outputs found

    Application of Probabilistic Ranking Systems on Women’s Junior Division Beach Volleyball

    Women’s beach volleyball is one of the fastest growing collegiate sports today. The increase in popularity has come with an increase in valuable scholarship opportunities across the country. With thousands of athletes to sort through, college scouts depend on websites that aggregate tournament results and rank players nationally. This project partnered with the company Volleyball Life, which is the current market leader in the ranking space for junior beach volleyball players. Utilizing the tournament information provided by Volleyball Life, this study explored replacements for the current ranking systems, which are designed to aggregate player points from recent tournament placements. Three probabilistic/modern ranking techniques were tested, specifically an Elo variant, TrueSkill, and a random walker graph network. This study found that Elo could predict match outcomes with a 13% higher accuracy than the preexisting systems and TrueSkill with an 11% higher accuracy.
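
    For readers unfamiliar with Elo-style ratings, the following is a minimal sketch of the textbook Elo update for a single head-to-head match. It is not the variant used in the study, which the abstract does not describe; the function names, the scale constant of 400, and the K-factor of 32 are conventional defaults assumed here for illustration only.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that side A beats side B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b) after one match.

    score_a is 1.0 if A won and 0.0 if A lost.
    k (the K-factor) is an assumed default, not a value from the study.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


if __name__ == "__main__":
    # Two equally rated sides: the winner gains exactly what the loser gives up.
    print(update(1500.0, 1500.0, 1.0))
```

    A match outcome can then be predicted by picking the side whose expected score exceeds 0.5, which is one plausible way to measure the kind of prediction accuracy the abstract reports.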

    A Hybrid Ensemble of Learning Models

    The advent of deep learning models has long challenged the place of statistical models in time series forecasting. This research proposes a new hybrid ensemble of forecasting models that combines the strengths of several strong candidates from these two model types. The proposed ensemble aims to improve the accuracy of forecasts and reduce computational complexity by leveraging the strengths of each candidate model.

    Extending the M3-Competition: Category and Interval-Specific Time Series Forecasting

    The M3-Competition found that simple models outperform more complex ones for time series forecasting. As part of these competitions, several claims were made that statistical models exceeded machine learning (ML) techniques, such as recurrent neural networks (RNN), in prediction performance. These findings may over-generalize the capabilities of statistical models since the analysis measured the total forecasting accuracy across a wide range of industries and fields and with different interval lengths. This investigation aimed to assess how statistical and ML methods compare when series are separated by category and time interval. Individual models built on the M3 data using Facebook Prophet and the R packages tswge, forecast, and nnfor showed significant differences in performance. The statistical models performed better for monthly – industry, macro, and micro combinations (Wilcoxon signed-rank adjusted p-value < 0.0001) for short-term forecast horizons (h=5). However, the multilayer perceptron (MLP) surpassed the statistical models in quarterly – industry data (p-value < 0.001) for the same forecast length. The statistical models also outperformed ML methods for long-term forecasts in the same category-by-interval combinations (p-value < 0.01). Thus, identifying which model may have increased performance in specific category, interval, and horizon combinations provides direct value for time series analysis.

    Demand Forecasting for Alcoholic Beverage Distribution

    Forecasting demand is one of the biggest challenges in any business, and the ability to make such predictions is an invaluable resource to a company. While difficult, predicting demand for products should be increasingly accessible due to the volume of data collected in businesses and the continuing advancements of machine learning models. This paper presents forecasting models for two vodka products for an alcoholic beverage distributing company located in the United States with the purpose of improving the company’s ability to forecast demand for those products. The results contain exploratory data analysis to determine the most important variables impacting demand, which are time of year and customer. For each of the two products, models were built to predict demand for three major customers. For each product/customer combination, this paper compares time series and deep learning models to a naive model to see if the prediction accuracy can be improved. For five out of six product/customer combinations, the time series models reduced error by 2.5–66.7% compared to the naive models. Also, for one product, a hybrid CNN model developed for this paper outperformed the time series models by 3–10% and reduced error by 49% compared to the naive models.
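
    The comparison against a naive model can be illustrated with a short sketch. The abstract does not say which naive model or error metric was used, so this example assumes a last-value ("persistence") forecast and mean absolute error; all function names are hypothetical.

```python
def naive_forecast(history, horizon):
    """Assumed naive baseline: repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def error_reduction_pct(baseline_error, model_error):
    """Percentage by which a model lowers error relative to the baseline."""
    return 100.0 * (baseline_error - model_error) / baseline_error


if __name__ == "__main__":
    history = [100, 110, 120]          # made-up demand history
    actual = [130, 140]                # made-up future demand
    model_prediction = [128, 137]      # made-up output of some candidate model

    baseline_mae = mean_absolute_error(actual, naive_forecast(history, len(actual)))
    model_mae = mean_absolute_error(actual, model_prediction)
    print(error_reduction_pct(baseline_mae, model_mae))
```

    Under these assumptions, a positive result means the candidate model beats the naive baseline, and a negative result means it does worse.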

    Traditional vs Machine Learning Approaches: A Comparison of Time Series Modeling Methods

    In recent years, various new Machine Learning and Deep Learning algorithms have been introduced, claiming to offer better performance than traditional statistical approaches when forecasting time series. Studies seeking evidence to support the usage of ML/DL over statistical approaches have been limited to comparing forecasting performance on univariate, linear time series data. This research compares the performance of traditional statistical and ML/DL methods for forecasting multivariate and nonlinear time series.

    Demand Forecasting In Wholesale Alcohol Distribution: An Ensemble Approach

    In this paper, historical data from a wholesale alcoholic beverage distributor was used to forecast sales demand. Demand forecasting is a vital part of the sale and distribution of many goods. Accurate forecasting can be used to optimize inventory, improve cash flow, and enhance customer service. However, demand forecasting is a challenging task due to the many unknowns that can impact sales, such as the weather and the state of the economy. While many studies focus effort on modeling consumer demand and endpoint retail sales, this study focused on demand forecasting from the distributor perspective. An ensemble approach was applied using traditional statistical univariate time series models, multivariate models, and contemporary deep learning-based models. The final ensemble models for the most sold product and the highest revenue grossing product were able to reduce sales forecasting error by nearly 50% and 33.5%, respectively, in comparison to a statistical naive model. Additionally, this paper determined that there is no one-size-fits-all demand model for all products sold by the distributor; each product needs an individually tuned model to meaningfully reduce error.

    Examining Bias in Jury Selection for Criminal Trials in Dallas County

    One of the hallmarks of the American judicial system is the concept of trial by jury, with that trial decided by an impartial jury of one's peers. Several landmark legal cases in the history of the United States have challenged this notion of equal representation by jury, most notably Batson v. Kentucky, 476 U.S. 79 (1986). Most of the previous research, focus, and legal precedent has centered on peremptory challenges and on attempts to prove whether bias played a role in excluding certain jurors from serving. Few studies, however, focus on examining challenges for cause based on self-reported biases from the venire, the group of potential jurors. This paper evaluates whether challenges for cause in non-death penalty felony criminal trials in Dallas County, TX are related to juror demographics and location.

    Modeling Electric Energy Generation in ERCOT during Extreme Weather Events and the Impact Renewable Energy has on Grid Reliability

    This paper shows the inadequacy of current grid backup supplies, their ongoing threat to grid reliability, and the increased risk of customer blackouts. This paper also examines reliability concerns caused by renewable energy sources under more normal operating conditions. This paper seeks to model the impacts of extreme weather on ERCOT's grid. Unusual weather patterns require that the electrical grid operate in conditions outside normal operating parameters. Electrical system demand can spike well above normal levels. At the same time, weather influences can change the operating envelope of electrical generation equipment. The ERCOT grid has changed from its traditional mix of conventionally powered generation, including gas, coal, and nuclear. The recent addition of significant renewable energy generation, solar and wind, poses a new set of challenges to grid reliability and energy availability. Using time-series analysis, this paper provides methods to model expected electrical demand for extreme events. Furthermore, this paper will explore what limitations on renewable energy are necessary considering grid reliability requirements. This paper shows that the grid currently operates with inadequate spinning reserve, an ongoing threat to grid reliability that risks forced blackouts for customers.

    Random Forest vs Logistic Regression: Binary Classification for Heterogeneous Datasets

    Selecting a learning algorithm to implement for a particular application on the basis of performance still remains an ad-hoc process using fundamental benchmarks such as evaluating a classifier’s overall loss function and misclassification metrics. In this paper we address the difficulty of model selection by evaluating the overall classification performance of random forest and logistic regression on datasets with various underlying structures: (1) increasing the variance in the explanatory and noise variables, (2) increasing the number of noise variables, (3) increasing the number of explanatory variables, (4) increasing the number of observations. We developed a model evaluation tool capable of simulating classifier models for these dataset characteristics and performance metrics such as true positive rate, false positive rate, and accuracy under specific conditions. We found that when increasing the variance in the explanatory and noise variables, logistic regression consistently performed with a higher overall accuracy as compared to random forest. However, random forest had a higher true positive rate than logistic regression and also yielded a higher false positive rate for datasets with increasing numbers of noise variables. Each case study consisted of 1000 simulations, and the model performances consistently showed the false positive rate for random forest with 100 trees to be statistically different from that of logistic regression. In all four cases, logistic regression and random forest achieved varying relative classification scores under various simulated dataset conditions.
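
    The three metrics named in this abstract have standard definitions, sketched below for binary labels coded as 0 and 1. This is an illustration only, not the authors' evaluation tool; the function name and the choice to return 0.0 when a denominator is zero are assumptions made here.

```python
def classification_rates(y_true, y_pred):
    """Compute true positive rate, false positive rate, and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly rejected negatives

    # Assumed convention: report 0.0 when a rate's denominator is empty.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"tpr": tpr, "fpr": fpr, "accuracy": accuracy}


if __name__ == "__main__":
    print(classification_rates([1, 1, 0, 0], [1, 0, 1, 0]))
```

    Because a classifier can raise its true positive rate by predicting the positive class more often, a higher true positive rate paired with a higher false positive rate, as reported for random forest, is consistent with lower overall accuracy.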
